Doc 302 new etl tutorial - part 1 #25320
base: master
I've been looking at pyproject.toml, setup.cfg and setup.py and thinking that they could be consolidated into pyproject.toml alone for many projects, especially for beginner-level tutorials.
@neverett I think the first portion of the tutorial is ready for your review. Once this section is good I can continue with the rest of the tutorial.
Overall, I think this is the right level of granularity and selection of topics for people looking to move beyond the Quickstart (esp. if they're somewhat experienced data engineers), and the pacing is good. I left fairly detailed feedback on pages 1 and 2, and high-level feedback for page 3 (asset dependencies and checks), since I think that one is worth splitting into two pages. Once you've taken another pass at the content, I'm happy to re-review and give more feedback on the downstream asset and asset checks content, and other content where it makes sense.
docs/docs-beta/docs/tutorial/03-asset-dependencies-and-checks.md
We added an “excludes” parameter to the `dagster project scaffold` command recently that could be used to create a project from the scaffold without tests for simplicity.
Thanks,
Daniel
________________________________
From: Alex Noonan ***@***.***>
Sent: Saturday, November 16, 2024 5:03:29 AM
To: dagster-io/dagster ***@***.***>
Cc: Daniel Bartley ***@***.***>; Comment ***@***.***>
Subject: Re: [dagster-io/dagster] Doc 302 new etl tutorial - part 1 (PR #25320)
@C00ldudeNoonan commented on this pull request.
________________________________
In docs/docs-beta/docs/tutorial/01-etl-tutorial-introduction.md<#25320 (comment)>:
+ pip install dagster dagster-webserver pandas dagster-duckdb
+ ```
+
+## Step 2: Copying Project Scaffold
+
+Next we will get the raw data for the project. As well as the project scaffold, Dagster has several pre-built scaffolds you can install depending on your use case. You can see the full up to date list by running. `dagster project list-examples`
+
+Use the project scaffold command for this project.
+ ```bash title="ETL Project Scaffold"
+ dagster project from-example --example getting_started_etl_tutorial
+ ```
+
+The project should have this structure.
+<!-- vale off -->
+```
+dagster-etl-tutorial/
For this tutorial we were trying to keep it as simple as possible. By excluding the test folder and the other specific file, we can have the user focus on the ETL pipeline and on getting familiar with the Dagster primitives. The last lesson in the tutorial will also cover refactoring the project into separate files for assets, resources, etc.
As an outsider, the three config files seem excessive. Dagster Cloud seems to have a hard dependency on setuptools, though.
Thanks,
Daniel
________________________________
From: Alex Noonan ***@***.***>
Sent: Saturday, November 16, 2024 5:10:30 AM
To: dagster-io/dagster ***@***.***>
Cc: Daniel Bartley ***@***.***>; Comment ***@***.***>
Subject: Re: [dagster-io/dagster] Doc 302 new etl tutorial - part 1 (PR #25320)
@C00ldudeNoonan commented on this pull request.
________________________________
In docs/docs-beta/docs/tutorial/01-etl-tutorial-introduction.md<#25320 (comment)>:
+### File/Directory Descriptions
+
+#### Dagster files
+
+- **etl_tutorial/**: This is a Python module that contains your Dagster code. It is the main directory where you will define your assets, jobs, schedules, sensors, and resources.
+
+ - **definitions.py**: This file is typically used to define jobs, schedules, and sensors. It organizes the various components of your Dagster project. This allows Dagster to load the definitions in a module.
+
+#### Python files
+
+- **pyproject.toml**: This file is used to specify build system requirements and package metadata for Python projects. It is part of the Python packaging ecosystem.
+
+- **setup.cfg**: This file is used for configuration of your Python package. It can include metadata about the package, dependencies, and other configuration options.
+
+- **setup.py**: This script is used to build and distribute your Python package. It is a standard file in Python projects for specifying package details.
+
just added!
Took another pass through and left some more feedback, let me know what you think! Happy to re-review as you add the remaining pages.
@neverett I added the remaining pages, feel free to review them when you get a chance.
Summary & Motivation
I'm a little way into this and would like to get feedback from @PedramNavid and @cmpadden on the structure and general flow. This isn't done at this point, but I figure we could collaborate here and iterate from there.
I made some changes to the reference file to make it more concise regarding metadata output. The new code example function works great.
Main Questions I have at this point:
How I Tested These Changes
Changelog